A Comparison of Consensus, Consistency, and Measurement Approaches to Estimating Interrater Reliability - Practical Assessment, Research & Evaluation
Abstract
This article argues that the general practice of describing interrater reliability as a single, unified concept is at best imprecise, and at worst potentially misleading. Rather than representing a single concept, different statistical methods for computing interrater reliability can be more accurately classified into one of three categories based upon the underlying goals of analysis. The three general categories introduced and described in this paper are: 1) consensus estimates, 2) consistency estimates, and 3) measurement estimates. The assumptions, interpretation, advantages, and disadvantages of estimates from each of these three categories are discussed, along with several popular methods of computing interrater reliability coefficients that fall under the umbrella of consensus, consistency, and measurement estimates. Researchers and practitioners should be aware that different approaches to estimating interrater reliability carry with them different implications for how ratings across multiple judges should be summarized, which may impact the validity of subsequent study results.
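The distinction between the first two categories can be made concrete with a small sketch. The abstract does not name specific statistics, so the choices here are assumptions made for illustration: simple percent agreement stands in for a consensus estimate and the Pearson correlation stands in for a consistency estimate. The ratings, the variable names, and the helper functions are all hypothetical.

```python
from statistics import mean


def percent_agreement(a, b):
    """Share of items on which two judges gave exactly the same rating
    (used here as an assumed example of a consensus estimate)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pearson_r(a, b):
    """Pearson correlation between two judges' ratings
    (used here as an assumed example of a consistency estimate)."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (ss_a * ss_b) ** 0.5


# Hypothetical ratings: judge B is always exactly one point more lenient.
judge_a = [1, 2, 3, 4, 2, 3]
judge_b = [2, 3, 4, 5, 3, 4]

agreement = percent_agreement(judge_a, judge_b)  # the judges never match exactly
consistency = pearson_r(judge_a, judge_b)        # yet they rank the items identically

print(f"percent agreement: {agreement:.2f}")
print(f"Pearson r:         {consistency:.2f}")
```

In this made-up case the two judges never assign the same score, yet they order the items identically, so the two statistics point in opposite directions. That is one way to see why the choice of estimate may affect how ratings from multiple judges are summarized.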
Related articles
Reliability: on the reproducibility of assessment data.
CONTEXT All assessment data, like other scientific experimental data, must be reproducible in order to be meaningfully interpreted. PURPOSE The purpose of this paper is to discuss applications of reliability to the most common assessment methods in medical education. Typical methods of estimating reliability are discussed intuitively and non-mathematically. SUMMARY Reliability refers to the...
A new brief instrument for assessing decisional capacity for clinical research.
CONTEXT There is a critical need for practical measures for screening and documenting decisional capacity in people participating in different types of clinical research. However, there are few reliable and validated brief tools that could be used routinely to evaluate individuals' capacity to consent to a research protocol. OBJECTIVE To describe the development, testing, and proposed use of ...
Reliability of scores on the Stroke Rehabilitation Assessment of Movement (STREAM) measure.
BACKGROUND AND PURPOSE The Stroke Rehabilitation Assessment of Movement (STREAM) is a new clinical measurement tool for evaluating the recovery of voluntary movement and basic mobility following stroke. This article presents the results of 3 substudies examining the reliability (interrater and intrarater) and internal consistency of STREAM scores. SUBJECTS AND METHODS A "direct-observation re...
Assessment of interrater and intrarater reliability in the evaluation of metered dose inhaler technique.
STUDY OBJECTIVE To determine if a training session using videotaped metered dose inhaler (MDI) performances can result in high interrater and intrarater reliability of five evaluators assessing MDI technique. DESIGN Five evaluators (three pharmacists, two pulmonary fellows) were trained to evaluate MDI technique during a 2-h training session. The training session consisted of verbal instructi...
Evaluation of fatigue scales in stroke patients.
BACKGROUND AND PURPOSE There is little information on how to best measure poststroke fatigue. Our aim was to identify which currently available fatigue scale is most valid, feasible, and reliable in stroke patients. METHODS Fatigue scales were identified by systematic search, and the 5 with the best face validity were identified by expert consensus. Feasibility (ie, did patients provide answe...